10 research outputs found

    Pseudo Label Selection is a Decision Problem

    Pseudo-labeling is a simple and effective approach to semi-supervised learning. It requires criteria that guide the selection of pseudo-labeled data, and these criteria have been shown to crucially affect pseudo-labeling's generalization performance. Several such criteria exist and have proven to work reasonably well in practice. However, their performance often depends on the initial model fit on labeled data: early overfitting can be propagated to the final model by choosing instances with overconfident but wrong predictions, an effect often called confirmation bias. In two recent works, we demonstrate that pseudo-label selection (PLS) can be naturally embedded into decision theory. This paves the way for BPLS, a Bayesian framework for PLS that mitigates the issue of confirmation bias. At its heart is a novel selection criterion: an analytical approximation of the posterior predictive of pseudo-samples and labeled data. We derive this selection criterion by proving Bayes optimality of this "pseudo posterior predictive". We empirically assess BPLS for generalized linear models, non-parametric generalized additive models, and Bayesian neural networks on simulated and real-world data. When faced with data prone to overfitting and thus a high chance of confirmation bias, BPLS outperforms traditional PLS methods. The decision-theoretic embedding further allows us to render PLS more robust towards the involved modeling assumptions. To achieve this goal, we introduce a multi-objective utility function. We demonstrate that it can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors, and covariate shift. Comment: Accepted for presentation at the 46th German Conference on Artificial Intelligence.
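
    As a rough illustration of the selection mechanism, the Python sketch below runs a greedy self-training loop in which each candidate pseudo-label is scored by the joint log-likelihood of labeled plus pseudo-labeled data under a refitted model. This score is only a simplified stand-in for the approximate pseudo posterior predictive, and the base learner, scoring function, and function names are placeholders rather than the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def joint_log_likelihood(model, X, y):
    """Joint log-likelihood of integer-coded labels (0..K-1) under a fitted model."""
    proba = model.predict_proba(X)
    return float(np.sum(np.log(proba[np.arange(len(y)), np.asarray(y, dtype=int)] + 1e-12)))

def pls_self_training(X_lab, y_lab, X_unlab, n_rounds=10):
    """Greedy self-training: each round, add the unlabeled instance whose
    pseudo-label yields the highest joint score, then refit."""
    X_lab, y_lab = np.asarray(X_lab, dtype=float), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, dtype=float)
    remaining = list(range(len(X_unlab)))
    for _ in range(min(n_rounds, len(remaining))):
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        best = None  # (score, index, pseudo_label)
        for i in remaining:
            x_i = X_unlab[i:i + 1]
            y_i = model.predict(x_i)  # candidate pseudo-label
            X_aug = np.vstack([X_lab, x_i])
            y_aug = np.concatenate([y_lab, y_i])
            # Score the candidate by refitting on the augmented data; this
            # stands in for evaluating the pseudo posterior predictive.
            cand = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
            score = joint_log_likelihood(cand, X_aug, y_aug)
            if best is None or score > best[0]:
                best = (score, i, y_i)
        _, i_star, y_star = best
        X_lab = np.vstack([X_lab, X_unlab[i_star:i_star + 1]])
        y_lab = np.concatenate([y_lab, y_star])
        remaining.remove(i_star)
    return LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
```

    The per-candidate refit inside the loop is exactly the kind of computational cost that an analytical approximation of the selection criterion would avoid.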

    Pseudo-Label Selection: Insights From Decision Theory


    An Empirical Study of Prior-Data Conflicts in Bayesian Neural Networks

    Imprecise Probabilities (IP) allow for the representation of incomplete information. In the context of Bayesian statistics, this is achieved by generalized Bayesian inference, where a set of priors is used instead of a single prior [1, Chapter 7.4]. The latter has been shown to be particularly useful in the case of prior-data conflict, where evidence from the data (likelihood) contradicts the prior information. In these practically highly relevant scenarios, classical (precise) probability models typically fail to adequately represent the uncertainty arising from this conflict. Generalized Bayesian inference by IP, however, has been proven to handle these prior-data conflicts well when inference in canonical exponential families is considered [3].

    Our study [2] aims at assessing the extent to which these problems of precise probability models are also present in Bayesian neural networks (BNNs). Unlike traditional neural networks, BNNs use stochastic weights that can be learned by updating the prior belief with the likelihood for each individual weight via Bayes' rule. In light of this, we investigate the impact of prior selection on the posterior of BNNs in the context of prior-data conflict. While the literature often advocates normal priors centered around 0, the consequences of this choice remain unknown when the data suggest high values for the individual weights. For this purpose, we designed synthetic datasets generated by neural networks (NNs) with fixed, high weight values. This approach enables us to measure the effect of prior-data conflict and to reduce model uncertainty, since the exact weights and the functional relationship are known. We used BNNs trained with Mean-Field Variational Inference (MFVI), which has not only seen increasing interest due to its scalability but also allows analytical computation of the posterior distributions, as opposed to simulation-based methods such as Markov Chain Monte Carlo (MCMC). In MFVI, the posterior distribution is approximated by a tractable distribution with a factorized form.

    In our work [2, Chapter 4.2], we provide evidence that exact priors centered around the true weights, which are known from the generating NN, outperform their inexact counterparts centered around zero in terms of predictive accuracy, data efficiency, and the quality of uncertainty estimates. These results imply that selecting a prior centered around 0 may be unintentionally informative, as previously noted by [4], resulting in significant losses in prediction accuracy, increased data requirements, and impractical uncertainty estimation. BNNs learned under prior-data conflict produced posterior means that were a weighted average of the prior mean and the highest-likelihood values; they therefore differed substantially from the correct weights while also exhibiting an unreasonably low posterior variance, indicating a high degree of certainty in these estimates. Varying the prior variance yielded similar observations, with models whose priors conflicted with the data exhibiting overconfidence in their posterior estimates compared to those using exact priors. To investigate the potential of IP methods, we are currently studying the effect of interval-valued expectation parameters for generating reasonable uncertainty estimates. Overall, our preliminary results show that classical BNNs produce overly confident but erroneous predictions in the presence of prior-data conflict. These findings motivate the use of IP methods in deep learning.
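
    The "weighted average" behaviour described above can be made precise in the simplest conjugate setting. The display below is standard conjugate-Gaussian algebra, stated for orientation rather than taken from the study itself: with known observation variance, the posterior mean is a precision-weighted average of the prior mean and the sample mean, while the posterior variance does not depend on how strongly the two disagree.

```latex
% Conjugate Gaussian model with known variance \sigma^2:
% prior  w \sim N(\mu_0, \sigma_0^2),  data  y_1,\dots,y_n \mid w \sim N(w, \sigma^2).
\mu_{\mathrm{post}}
  = \frac{\sigma_0^{-2}\,\mu_0 + n\,\sigma^{-2}\,\bar{y}}{\sigma_0^{-2} + n\,\sigma^{-2}},
\qquad
\sigma_{\mathrm{post}}^{2}
  = \frac{1}{\sigma_0^{-2} + n\,\sigma^{-2}} .
```

    If $\mu_0 = 0$ but the data point to a large weight (large $\bar{y}$), the posterior mean is shrunk towards zero, yet $\sigma_{\mathrm{post}}^{2}$ does not grow with the size of the disagreement. This is the overconfidence under prior-data conflict that sets of priors are meant to expose.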

    Interpreting Generalized Bayesian Inference by Generalized Bayesian Inference

    The concept of safe Bayesian inference [4] with learning rates [5] has recently sparked a lot of research, e.g. in the context of generalized linear models [2]. It is occasionally also referred to as generalized Bayesian inference, e.g. in [2, page 1] – a fact that should let IP advocates sit up straight and take notice, as this term is commonly used to describe Bayesian updating of credal sets. On this poster, we demonstrate that this reminiscence extends beyond terminology.
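
    For readers unfamiliar with the terminological clash: the learning-rate version tempers the likelihood by a rate $\eta$. The display below states the standard Gibbs-posterior form for orientation; it is not a formula quoted from the poster.

```latex
% Generalized ("safe") Bayesian posterior with learning rate \eta > 0:
\pi_\eta(\theta \mid x_{1:n})
  \;\propto\; \pi(\theta)\,\prod_{i=1}^{n} p(x_i \mid \theta)^{\eta} .
```

    Generalized Bayesian inference in the IP sense, by contrast, updates every prior in a credal set $\mathcal{M}$ element-wise by Bayes' rule, yielding the set of posteriors $\{\pi(\cdot \mid x_{1:n}) : \pi \in \mathcal{M}\}$; the poster's claim is that the resemblance between the two notions runs deeper than the shared name.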

    Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement

    Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are quite common in statistics and machine learning. Nevertheless, how to properly exploit the entire information encoded in them is still an open question. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine. Comment: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023).
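
    One way to formalize the order sketched in the abstract is via a family $\mathcal{U}$ of real-valued representations compatible with the locally varying scale; the display below is our paraphrase under that assumption, not the paper's exact definition.

```latex
% Generalized stochastic dominance (GSD), paraphrased:
% X is (weakly) preferred to Y if every admissible representation u agrees in expectation.
X \succsim_{\mathrm{GSD}} Y
  \;:\Longleftrightarrow\;
  \mathbb{E}\bigl[u(X)\bigr] \;\ge\; \mathbb{E}\bigl[u(Y)\bigr]
  \quad \text{for all } u \in \mathcal{U} .
```

    Taking $\mathcal{U}$ to be all monotone representations (no cardinal structure) recovers first-order stochastic dominance, while a single fixed cardinal representation recovers the plain expectation order, matching the two extreme cases named in the abstract.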

    Not All Data Are Created Equal: Lessons From Sampling Theory For Adaptive Machine Learning

    In survey methodology, inverse probability weighted (Horvitz-Thompson) estimation has become an indispensable part of statistical inference. This is triggered by the need to deal with complex samples, that is, non-identically distributed data. The general idea is that weighting observations inversely to their probability of being included in the sample produces unbiased estimators with reduced variance. In this work, we argue that complex samples are subtly ubiquitous in two promising subfields of data science: Self-Training in Semi-Supervised Learning (SSL) and Bayesian Optimization (BO). Both methods rely on refitting learners to artificially enhanced training data. These enhancements are based on pre-defined criteria that select data points, rendering some data more likely to be added than others. We experimentally analyze the distance of the so-produced complex samples from i.i.d. samples by Kullback-Leibler divergence and maximum mean discrepancy. What is more, we propose to handle such samples by inverse probability weighting. This requires estimating inclusion probabilities. Unlike for some observational survey data, however, this is not a major issue, since, excitingly, we have plenty of explicit information on the inclusion mechanism: after all, we generate the data ourselves by means of the selection criteria.

    To make things more tangible, consider the case of BO first. It optimizes an unknown function by iteratively approximating it through a surrogate model, whose mean and standard error estimates are scalarized into a selection criterion. The arguments of this criterion's optima are evaluated and added to the training data. We propose to weight them by means of the surrogate model's standard errors at the time of selection. For the case of deploying random forests as surrogate models, we refit them by weighted drawing in the bootstrap sampling step. Refitting may be done iteratively, aiming at speeding up the optimization, or after convergence, aiming at providing practitioners with a (global) interpretable surrogate model.

    Similarly, self-training in SSL selects instances from a set of unlabeled data, predicts their labels, and adds these pseudo-labeled data to the training data. Instances are selected according to a confidence measure, e.g. the predictive variance. Regions in the feature space where the model is very confident are thus over-represented in the selected sample. We again explicitly exploit the selection criteria to define weights, which we use for resampling-based refitting of the model. Somewhat counter-intuitively, the more confident the model is in the self-assigned labels, the lower their weights should be to counteract the selection bias. Preliminary results suggest this can increase generalization performance.
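
    The weighting idea for the self-training case can be sketched in a few lines; the confidence-to-inclusion-probability mapping and the base learner below are placeholder choices for illustration, not the ones used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def refit_with_inverse_probability_weights(X_lab, y_lab, X_pseudo, y_pseudo,
                                           confidences):
    """Refit a learner on labeled plus pseudo-labeled data, weighting each
    pseudo-labeled point inversely to its (estimated) inclusion probability."""
    confidences = np.asarray(confidences, dtype=float)
    # Placeholder: treat the normalized confidence score as the inclusion
    # probability, so the most confidently selected points get the smallest
    # weights (Horvitz-Thompson style inverse probability weighting).
    incl_prob = confidences / confidences.sum()
    pseudo_weights = 1.0 / (incl_prob * len(confidences))
    weights = np.concatenate([np.ones(len(y_lab)), pseudo_weights])

    X = np.vstack([np.asarray(X_lab, dtype=float), np.asarray(X_pseudo, dtype=float)])
    y = np.concatenate([np.asarray(y_lab), np.asarray(y_pseudo)])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```

    For the BO case, the same recipe applies with the surrogate model's standard error at selection time taking the role of the confidence score, and with weighted drawing in the bootstrap step when the surrogate is a random forest.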

    Approximately Bayes-Optimal Pseudo Label Selection

    Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). The selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to come up with an approximation based on Laplace's method and the Gaussian integral. We empirically assess BPLS for parametric generalized linear and non-parametric generalized additive models on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods. Comment: 10 pages, 3 figures.
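
    For orientation, the Laplace approximation to a marginal likelihood that the abstract alludes to has the standard form below; that the paper applies it to the joint of labeled data and pseudo-samples is our reading of the abstract, not a quotation.

```latex
% Laplace approximation of the marginal likelihood (evidence) for a d-dimensional
% parameter \theta with posterior mode \hat{\theta}:
p(\mathcal{D})
  = \int p(\mathcal{D} \mid \theta)\,\pi(\theta)\,\mathrm{d}\theta
  \;\approx\;
  p(\mathcal{D} \mid \hat{\theta})\,\pi(\hat{\theta})\,
  (2\pi)^{d/2}\,\bigl|H(\hat{\theta})\bigr|^{-1/2},
\qquad
H(\hat{\theta})
  = -\nabla_{\theta}^{2}\,
    \log\bigl[p(\mathcal{D} \mid \theta)\,\pi(\theta)\bigr]\Big|_{\theta=\hat{\theta}} .
```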

    Pre-selection of Suitable Regression Methods for the Determination of Interactions and Forecasts in Global Production Networks

    The locations of many manufacturing companies are distributed globally. This has led to historically grown global production networks whose structure is often complex, not transparent, and influenced by many factors. The large number and the volatility of the influencing factors and dependencies in the network additionally complicate the network configuration. As a result, adaptation needs and optimization possibilities are recognized too late or not at all. Enabling early recognition of saving potentials therefore requires active monitoring and analysis of changes in, and dependencies of, the influencing factors on the production network. Because a multitude of influencing factors has to be considered, additional tools are needed to keep this task manageable for the network planner. Data-based methods can support the forecasting of influencing factors and the determination of their dependencies. In other research fields, regression analysis is an established method for such data-based analyses. This paper focuses on the use of regression analysis in global production networks. For an accurate analysis, it is essential to choose the right regression method from the many different types in existence. A systematic literature review is conducted to establish an overview of regression methods used in other research fields; a search strategy is developed and implemented, and the key findings of the literature review are derived and evaluated. In a second step, a new approach for the pre-selection of suitable regression methods for the determination of interactions and forecasts in global production networks is proposed.